Abstract

The value of a finite-state two-player zero-sum stochastic game with limit-average payoff can
be approximated to within $\varepsilon$ in time exponential in a polynomial in the size of the game times
a polynomial in $\log\frac{1}{\varepsilon}$, for all $\varepsilon > 0$.

1 Introduction

A zero-sum stochastic game is a repeated game over a finite state space, played by two players. 
Each player has a non-empty set of actions available at every state, and in each round, each player 
chooses an action from the set of available actions at the current state simultaneously with and 
independent from the other player. The transition function is probabilistic, and the next state is 
determined by a probability distribution depending on the current state and the actions chosen by 
the players. In each round, player 1 gets (and player 2 loses) a reward depending on the current 
state and the actions chosen by the players. The players are informed of the history of the play 
consisting of the sequence of states visited and the actions of the players played so far in the play. 
A strategy for a player is a recipe to extend the play: given a finite sequence of states and pairs of 
actions representing the history of the play, a strategy specifies a probability distribution over the 
set of available actions at the last state of the history. The limit-average reward for player 1 of a
pair of strategies $\sigma$ and $\pi$ for player 1 and player 2, respectively, and a starting state $s$ is defined as

$$v_1(s,\sigma,\pi) = \mathbb{E}_s^{\sigma,\pi}\Big[\liminf_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} r(X_i,\Theta_{i,1},\Theta_{i,2})\Big],$$

where $X_i$ is the random variable for the state reached at round $i$ of the game, and $\Theta_{i,j}$ is the
random variable for the action played by player $j$ at round $i$ of the game, under strategies $\sigma$ and
$\pi$ and starting state $s$, and $r(s,a,b)$ gives the reward at state $s$ for actions $a$ and $b$. The form
of the objective explains the term limit average. First the average is taken with respect to the
expected rewards in the first $n$ rounds of the game. Then the objective is defined as the liminf of
these averages. A stochastic game with a limit-average reward is called a limit-average game. The
fundamental question in stochastic games is the existence of a value, that is, whether

$$\sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} v_1(s,\sigma,\pi) = \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} v_1(s,\sigma,\pi),$$

where $\Sigma$ and $\Pi$ denote the sets of strategies for player 1 and player 2, respectively.

Stochastic games were introduced by Shapley [16], who showed the existence of a value in 
discounted games, where the game stops at each round with probability $\beta$, for some $0 < \beta < 1$, and
the goal of a player is to maximize the expectation of the total sum of the rewards. Limit-average
games were introduced by Gillette [7], who studied the special cases of perfect information (in each 
round, at most one player has a choice of moves) and irreducible stochastic games. The existence 
of a value for the perfect information case was proved in [10]. Gillette's paper also introduced a 
limit-average game called the Big Match, which was solved in [5]. Bewley and Kohlberg [4] then 
showed how Puiseux series expansions can be used for the asymptotic analysis of discounted games.
This, and the winning strategy in the Big Match, was used by Mertens and Neyman [11] to show 
the existence of a value in limit-average games. 

While the existence of a value in general limit-average stochastic games has been extensively 
studied, the computation of values has received less attention.$^{1}$ In general, it may happen that a
game with rational rewards and rational transition probabilities still has an irrational value [15].
Hence, we can only hope to have approximation algorithms that compute the value of a game up to
a given approximation $\varepsilon$, for a real $\varepsilon > 0$. Even the approximation of values is not simple, because
in general limit-average games only admit $\eta$-optimal strategies, for all reals $\eta > 0$, rather than
optimal strategies [5], and the $\eta$-optimal strategies of [11] require infinite memory. This precludes,
for example, common algorithmic techniques that enumerate over certain finite sets of strategies 
and, having fixed a strategy, solve the resulting Markov decision process using linear programming 
techniques [6]. Most research has therefore characterized particular subclasses of games for which 
stationary optimal strategies exist (a stationary strategy is independent of the history of a play 
and depends only on the current state) [14, 8] (see [6] for a survey), and the main algorithmic tool 
has been value or policy iteration, which can be shown to terminate in an exponential number of 
steps (but often behaves better in practice) for many of these particular classes. 

In this paper, we characterize the computational complexity of approximating the value of a 
limit-average game. We show that for any given real $\varepsilon > 0$, the value of a game $G$ at a state can
be computed to within $\varepsilon$-precision in time bounded by an exponential in a polynomial in the size
of the game $G$ times a polynomial function of $\log\frac{1}{\varepsilon}$. This shows that approximating the value of
limit-average games lies in the computational complexity class EXPTIME [13]. Our main technique
is the characterization of values as semi-algebraic quantities [4, 11]. We show that for a real number
$\alpha$, whether the value of a stochastic limit-average game at a state $s$ is strictly greater than $\alpha$ can
be expressed as a sentence in the theory of real-closed fields. Moreover, this sentence is polynomial
in the size of the game and has a constant number of quantifier alternations. The theory of
real-closed fields is decidable in time exponential in the size of a formula and doubly exponential in the
quantifier alternation depth [1]. This, together with binary search over the range of values, gives an
algorithm that runs in time exponential in a polynomial in the size of the game graph times a polynomial
in $\log\frac{1}{\varepsilon}$ to approximate the value, for $\varepsilon > 0$. Our techniques combine several known results to provide
the first complexity bound on the general problem of approximating the value of stochastic games 
with limit-average objectives. It may be noted that the best known deterministic algorithm for the 
special case of perfect information limit-average games also requires exponential time. 

1 In this paper we take the classical view of computation, where an algorithm either answers "Yes" or "No", or
outputs a set of rational numbers

2 Definitions



Probability distributions. For a finite set $A$, a probability distribution on $A$ is a function
$\delta : A \to [0, 1]$ such that $\sum_{a \in A} \delta(a) = 1$. We denote the set of probability distributions on $A$ by
$\mathcal{D}(A)$. For a distribution $\delta \in \mathcal{D}(A)$, we denote by $\mathrm{Supp}(\delta) = \{ x \in A \mid \delta(x) > 0 \}$ the support of $\delta$.

Definition 1 (Stochastic games) A (two-player zero-sum) stochastic game $G = (S, A, \Gamma_1, \Gamma_2, \delta, r)$
consists of the following components.

• A finite set $S$ of states.

• A finite set $A$ of moves or actions.

• Two move assignments $\Gamma_1, \Gamma_2 : S \to 2^A \setminus \emptyset$. For $i \in \{1, 2\}$, assignment $\Gamma_i$ associates with each
state $s \in S$ a non-empty set $\Gamma_i(s) \subseteq A$ of moves available to player $i$ at state $s$.

• A probabilistic transition function $\delta : S \times A \times A \to \mathcal{D}(S)$ that gives the probability $\delta(s, a, b)(t)$
of a transition from $s$ to $t$ when player 1 plays move $a$ and player 2 plays move $b$, for all
$s, t \in S$ and $a \in \Gamma_1(s)$, $b \in \Gamma_2(s)$.

• A reward function $r : S \times A \times A \to \mathbb{R}$ that maps every state and pair of moves to a real-valued
reward.

The special class of perfect-information games can be obtained from stochastic games with the
restriction that for all $s \in S$ either $|\Gamma_1(s)| = 1$ or $|\Gamma_2(s)| = 1$, i.e., at every state at most one player
can influence the transition. If the transition function $\delta$ is deterministic rather than probabilistic
then we call the game a deterministic game. The class of rational stochastic games is the special
class of stochastic games such that all rewards and transition probabilities are rational.

Size of a stochastic game. Given a stochastic game $G$ we use the following notations:

1. $n = |S|$ is the number of states;

2. $|\delta| = \sum_{s \in S} |\Gamma_1(s)| \cdot |\Gamma_2(s)|$ is the number of entries of the transition function.

Given a rational stochastic game we use the following notations:

1. $\mathrm{size}(\delta) = \sum_{s,t \in S} \sum_{a \in \Gamma_1(s)} \sum_{b \in \Gamma_2(s)} |\delta(s,a,b)(t)|$, where $|\delta(s,a,b)(t)|$ denotes the space to
express $\delta(s,a,b)(t)$ in binary;

2. $\mathrm{size}(r) = \sum_{s \in S} \sum_{a \in \Gamma_1(s)} \sum_{b \in \Gamma_2(s)} |r(s,a,b)|$, where $|r(s,a,b)|$ denotes the space to express
$r(s,a,b)$ in binary;

3. $|G| = \mathrm{size}(G) = \mathrm{size}(\delta) + \mathrm{size}(r)$.

The specification of a game $G$ requires $O(|G|)$ bits. Given a stochastic game with $n$ states, we assume
without loss of generality that the state space of the stochastic game structure is enumerated as
natural numbers, $S = \{ 1, 2, \ldots, n \}$, i.e., the states are numbered from 1 to $n$.
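As an illustration of Definition 1 and the size measures, the following is a minimal Python sketch of one possible in-memory representation of a game. The class name `StochasticGame`, its field names, and the two-state example game are assumptions made for this sketch; they do not come from the paper.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class StochasticGame:
    """Hypothetical container for G = (S, A, Gamma_1, Gamma_2, delta, r)."""
    n: int          # states are numbered 1..n
    moves1: dict    # Gamma_1: state -> list of moves of player 1
    moves2: dict    # Gamma_2: state -> list of moves of player 2
    delta: dict     # (s, a, b) -> {successor state: probability}
    reward: dict    # (s, a, b) -> reward

    def states(self):
        return range(1, self.n + 1)

    def num_transition_entries(self):
        # |delta| = sum over states s of |Gamma_1(s)| * |Gamma_2(s)|
        return sum(len(self.moves1[s]) * len(self.moves2[s]) for s in self.states())

    def is_perfect_information(self):
        # at every state at most one player has a choice of moves
        return all(len(self.moves1[s]) == 1 or len(self.moves2[s]) == 1
                   for s in self.states())

    def is_well_formed(self):
        # move sets are non-empty and every delta(s, a, b) is a distribution over S
        for s in self.states():
            if not self.moves1[s] or not self.moves2[s]:
                return False
            for a in self.moves1[s]:
                for b in self.moves2[s]:
                    dist = self.delta[(s, a, b)]
                    if any(t not in self.states() or p < 0 for t, p in dist.items()):
                        return False
                    if sum(dist.values()) != 1:
                        return False
        return True


# Illustrative game (not from the paper): state 1 moves to state 1 or 2 with
# probability 1/2 each, state 2 is absorbing.
half = Fraction(1, 2)
delta = {(1, a, b): {1: half, 2: half} for a in ("a1", "a2") for b in ("b1", "b2")}
delta[(2, "a1", "b1")] = {2: Fraction(1)}
reward = {key: Fraction(0) for key in delta}
reward[(1, "a1", "b1")] = Fraction(1)

game = StochasticGame(
    n=2,
    moves1={1: ["a1", "a2"], 2: ["a1"]},
    moves2={1: ["b1", "b2"], 2: ["b1"]},
    delta=delta,
    reward=reward,
)
```

Exact rationals (`Fraction`) are used so that the check that each transition distribution sums to 1 is not affected by floating-point rounding.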

At every state $s \in S$, player 1 chooses a move $a \in \Gamma_1(s)$, and simultaneously and independently
player 2 chooses a move $b \in \Gamma_2(s)$. The game then proceeds to the successor state $t$ with probability
$\delta(s,a,b)(t)$, for all $t \in S$. At the state $s$, for moves $a$ for player 1 and $b$ for player 2, player 1 wins
and player 2 loses a reward of value $r(s, a, b)$.

A path or a play $\omega$ of $G$ is an infinite sequence $\omega = (s_0, (a_0, b_0), s_1, (a_1, b_1), s_2, (a_2, b_2), \ldots)$ of
states and pairs of moves such that $(a_i, b_i) \in \Gamma_1(s_i) \times \Gamma_2(s_i)$ and $s_{i+1} \in \mathrm{Supp}(\delta(s_i, a_i, b_i))$, for all
$i \ge 0$. We denote by $\Omega$ the set of all paths, and by $\Omega_s$ the set of all paths starting from state $s$.

Randomized strategies. A strategy for player 1 is a function $\sigma : (S \times A \times A)^* \cdot S \to \mathcal{D}(A)$
that associates with every prefix of a play, representing the history of the play so far, and the
current state a probability distribution from $\mathcal{D}(A)$ such that for all $w \in (S \times A \times A)^*$ and all
$s \in S$, we have $\mathrm{Supp}(\sigma(w \cdot s)) \subseteq \Gamma_1(s)$. Observe that the strategies can be randomized (i.e., not
necessarily deterministic) and history-dependent (i.e., not necessarily stationary). Similarly we
define strategies $\pi$ for player 2. We denote by $\Sigma$ and $\Pi$ the sets of strategies for player 1 and player
2, respectively.

Once the starting state $s$ and the strategies $\sigma$ and $\pi$ for the two players have been chosen, the
game is reduced to a stochastic process. Hence, the probabilities of events are uniquely defined,
where an event $\mathcal{E} \subseteq \Omega_s$ is a measurable set of paths. For an event $\mathcal{E} \subseteq \Omega_s$, we denote by $\Pr_s^{\sigma,\pi}(\mathcal{E})$
the probability that a path belongs to $\mathcal{E}$ when the game starts from $s$ and the players follow the
strategies $\sigma$ and $\pi$. We denote by $\mathbb{E}_s^{\sigma,\pi}[\cdot]$ the associated expectation operator with the probability
measure $\Pr_s^{\sigma,\pi}(\cdot)$. For $i \ge 0$, we denote by $X_i : \Omega_s \to S$ the random variable denoting the $i$-th state
along a path, and for $j \in \{1,2\}$, we denote by $\Theta_{i,j} : \Omega_s \to A$ the random variable denoting the
move of player $j$ in the $i$-th round of a play.

Limit-average payoff. Let $\sigma$ and $\pi$ be strategies of player 1 and player 2, respectively. The
limit-average payoff $v_1(s, \sigma, \pi)$ for player 1 at a state $s$, for the strategies $\sigma$ and $\pi$, is defined as

$$v_1(s, \sigma, \pi) = \mathbb{E}_s^{\sigma,\pi}\Big[\liminf_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big].$$

Similarly, for player 2, the payoff $v_2(s, \sigma, \pi)$ is defined as

$$v_2(s, \sigma, \pi) = \mathbb{E}_s^{\sigma,\pi}\Big[\limsup_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} -r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big].$$

In other words, player 1 wins and player 2 loses the "long-run" average of the rewards of the play.
A stochastic game $G$ with limit-average payoff is called a stochastic limit-average game.
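To make the averaging concrete, the sketch below computes the expected average reward over the first $N$ rounds when both players use stationary strategies, by propagating the state distribution round by round. This is only the finite-horizon quantity inside the limit, and the restriction to stationary strategies, the function name `expected_average_reward`, the convention that round 1 is played at the starting state, and the two-state example are all assumptions of this sketch.

```python
from fractions import Fraction


def expected_average_reward(n, delta, reward, sigma, pi, start, N):
    """Expected average reward of the first N rounds under stationary strategies.

    sigma[s] and pi[s] map the moves available at state s to probabilities.
    Round 1 is played at the starting state (an assumption of this sketch).
    """
    dist = {s: Fraction(0) for s in range(1, n + 1)}
    dist[start] = Fraction(1)
    total = Fraction(0)
    for _ in range(N):
        nxt = {s: Fraction(0) for s in range(1, n + 1)}
        for s, ps in dist.items():
            if ps == 0:
                continue
            for a, pa in sigma[s].items():
                for b, pb in pi[s].items():
                    weight = ps * pa * pb
                    total += weight * reward[(s, a, b)]      # expected reward this round
                    for t, pt in delta[(s, a, b)].items():   # push mass to successors
                        nxt[t] += weight * pt
        dist = nxt
    return total / N


# Illustrative game: the play alternates 1 -> 2 -> 1 -> ..., with reward 1 in
# state 1 and reward 0 in state 2; each player has a single move.
delta = {(1, "a", "b"): {2: Fraction(1)}, (2, "a", "b"): {1: Fraction(1)}}
reward = {(1, "a", "b"): Fraction(1), (2, "a", "b"): Fraction(0)}
sigma = {1: {"a": Fraction(1)}, 2: {"a": Fraction(1)}}
pi = {1: {"b": Fraction(1)}, 2: {"b": Fraction(1)}}

avg_10 = expected_average_reward(2, delta, reward, sigma, pi, start=1, N=10)
avg_3 = expected_average_reward(2, delta, reward, sigma, pi, start=1, N=3)
```

In this example the finite averages oscillate around 1/2 and converge to it as $N$ grows, which is the long-run average of the play.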

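Given a state $s \in S$, we are interested in finding the maximal payoff that player 1 can
ensure against all strategies for player 2, and the maximal payoff that player 2 can ensure against
all strategies for player 1. We call such payoff the value of the game $G$ at $s$ for player $i \in \{1, 2\}$.
The values for player 1 and player 2 are defined for all $s \in S$ by

$$v_1(s) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} v_1(s, \sigma, \pi) \quad\text{and}\quad v_2(s) = \sup_{\pi\in\Pi}\inf_{\sigma\in\Sigma} v_2(s, \sigma, \pi).$$

Mertens and Neyman [11] established the determinacy of stochastic limit-average games.

Theorem 1 [11] For all stochastic limit-average games $G$ and for all states $s$ of $G$, we have
$v_1(s) + v_2(s) = 0$.

Stronger notion of existence of values [11]. The values for stochastic limit-average games
exist in a strong sense: for all reals $\varepsilon > 0$, there exist strategies $\sigma^* \in \Sigma$, $\pi^* \in \Pi$ such that the
following conditions hold:

1. for all $\sigma \in \Sigma$ and $\pi \in \Pi$, we have

$$-\varepsilon + \mathbb{E}_s^{\sigma,\pi^*}\Big[\limsup_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big] \le \mathbb{E}_s^{\sigma^*,\pi}\Big[\liminf_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big] + \varepsilon; \quad (1)$$

2. there exists an integer $N_0$ such that for all $\sigma \in \Sigma$ and $\pi \in \Pi$, for all integers $N \ge N_0$, we have

$$-\varepsilon + \mathbb{E}_s^{\sigma,\pi^*}\Big[\frac{1}{N}\sum_{i=1}^{N} r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big] \le \mathbb{E}_s^{\sigma^*,\pi}\Big[\frac{1}{N}\sum_{i=1}^{N} r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big] + \varepsilon; \quad (2)$$

3. there exists $0 < \beta_0 < 1$ such that for all $\sigma \in \Sigma$ and $\pi \in \Pi$, for all $0 < \beta \le \beta_0$, we have

$$-\varepsilon + \mathbb{E}_s^{\sigma,\pi^*}\Big[\sum_{i=1}^{\infty} \beta (1-\beta)^{i} \cdot r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big] \le \mathbb{E}_s^{\sigma^*,\pi}\Big[\sum_{i=1}^{\infty} \beta (1-\beta)^{i} \cdot r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big] + \varepsilon. \quad (3)$$

Let $\overline{v}_1(s, \sigma, \pi) = \mathbb{E}_s^{\sigma,\pi}\Big[\limsup_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big]$; then (1) is equivalent to the following
equality:

$$\sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi} v_1(s, \sigma, \pi) = \inf_{\pi\in\Pi}\sup_{\sigma\in\Sigma} \overline{v}_1(s, \sigma, \pi).$$
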



3 Theory of Real-closed Fields and Quantifier Elimination 

Our main technique is to represent the value of a game as a formula in the theory of real-closed fields.
We denote by $\mathbf{R}$ the real-closed field $(\mathbb{R}, +, \cdot, 0, 1, \le)$ of the reals with addition and multiplication.
In the sequel we write "real-closed field" to denote the real-closed field $\mathbf{R}$. An atomic formula
is an expression of the form $p < 0$ or $p = 0$, where $p$ is a (possibly) multi-variate polynomial
with coefficients in the real-closed field. Coefficients are rationals or symbolic constants (e.g., the
symbolic constant $e$ stands for $2.71828\ldots$). We will consider the special case when only rational
coefficients of the form $\frac{q_1}{q_2}$, where $q_1, q_2$ are integers, are allowed. A formula is constructed from
atomic formulas by the grammar

$$\varphi ::= a \mid \neg a \mid \varphi \wedge \varphi \mid \varphi \vee \varphi \mid \exists x.\varphi \mid \forall x.\varphi,$$

where $a$ is an atomic formula, $\neg a$ denotes complement of $a$, $\varphi_1 \wedge \varphi_2$ denotes conjunction of $\varphi_1$
and $\varphi_2$, $\varphi_1 \vee \varphi_2$ denotes disjunction of $\varphi_1$ and $\varphi_2$, and $\exists$ and $\forall$ denote existential and universal
quantification, respectively. We use the standard abbreviations such as $p \le 0$, $p \ge 0$ and $p > 0$ that
are derived as follows:

$p \le 0$ (for $p < 0 \vee p = 0$), $p \ge 0$ (for $\neg(p < 0)$), and $p > 0$ (for $\neg(p \le 0)$).

The semantics of formulas are given in a standard way. A variable $x$ is free in the formula $\varphi$ if it is
not in the scope of a quantifier $\exists x$ or $\forall x$. A sentence is a formula with no free variables. A formula
is quantifier-free if it does not contain any existential or universal quantifier. Two formulas $\varphi_1$ and
$\varphi_2$ are equivalent if the sets of free variables of $\varphi_1$ and $\varphi_2$ are the same, and for every assignment
to the free variables the formula $\varphi_1$ is true if and only if the formula $\varphi_2$ is true. A formula $\varphi$
admits quantifier elimination if there is an algorithm to convert it to an equivalent quantifier-free
formula. A quantifier elimination algorithm takes as input a formula $\varphi$ and returns an equivalent
quantifier-free formula, if one exists.

Tarski proved that every formula in the theory of real-closed fields admits quantifier elimination,
and (by way of quantifier elimination) that there is an algorithm to decide the truth of a sentence
$\varphi$ in the theory of real-closed fields (see [18] for algorithms that decide the truth of a sentence $\varphi$
in the theory of real-closed fields). The complexity of the algorithm of Tarski has subsequently
been improved, and we now present a result of Basu [1] on the complexity of quantifier elimination for
formulas in the theory of the real-closed field.

Complexity of quantifier elimination. We first define the length of a formula $\varphi$, and then
define the size of a formula with rational coefficients. We denote the length and size of $\varphi$ as $\mathrm{len}(\varphi)$
and $\mathrm{size}(\varphi)$, respectively. The length of a polynomial $p$ is defined as the sum of the lengths of its
constituent monomials plus the number of monomials in the polynomial. The length of a monomial
is defined as its degree plus the number of variables plus 1 (for the coefficient). For example, for
a monomial $c \cdot x^3 \cdot y^2 \cdot z$ with coefficient $c$, its length is $6 + 3 + 1 = 10$. Given a polynomial $p$, the length of both
$p < 0$ and $p = 0$ is $\mathrm{len}(p) + 2$. This defines the length of an atomic formula $a$. The length of a
formula $\varphi$ is inductively defined as follows:

$\mathrm{len}(\neg a) = \mathrm{len}(a) + 1$;
$\mathrm{len}(\varphi_1 \wedge \varphi_2) = \mathrm{len}(\varphi_1) + \mathrm{len}(\varphi_2) + 1$;
$\mathrm{len}(\varphi_1 \vee \varphi_2) = \mathrm{len}(\varphi_1) + \mathrm{len}(\varphi_2) + 1$;
$\mathrm{len}(\exists x.\varphi) = \mathrm{len}(\varphi) + 2$;
$\mathrm{len}(\forall x.\varphi) = \mathrm{len}(\varphi) + 2$.

Observe that the length of a formula is defined for formulas that may contain symbolic constants
as coefficients. For formulas with rational coefficients we define the size as follows: the size of $\varphi$, i.e.,
$\mathrm{size}(\varphi)$, is defined as the sum of $\mathrm{len}(\varphi)$ and the space required to specify the rational coefficients
of the polynomials appearing in $\varphi$ in binary. We state a result of Basu [1] on the complexity
of quantifier elimination for the real-closed field. The following theorem is a specialization of
Theorem 1 of [1]; also see Theorem 14.14 and Theorem 14.16 of [2].

Theorem 2 [1] Let $d, k, m$ be nonnegative integers, $X = \{ X_1, X_2, \ldots, X_k \}$ be a set of $k$ variables,
and $\mathcal{P} = \{ p_1, p_2, \ldots, p_m \}$ be a set of $m$ polynomials over the set $X$ of variables, each of degree at
most $d$ and with coefficients in the real-closed field. Let $X_{[r]}, X_{[r-1]}, \ldots, X_{[1]}$ denote a partition of
the set $X$ of variables into $r$ subsets such that the set $X_{[i]}$ of variables has size $k_i$, i.e., $k_i = |X_{[i]}|$
and $\sum_{i=1}^{r} k_i = k$. Let

$$\Phi = (Q_r X_{[r]}).\ (Q_{r-1} X_{[r-1]}).\ \cdots\ .(Q_2 X_{[2]}).\ (Q_1 X_{[1]}).\ \varphi(p_1, p_2, \ldots, p_m)$$

be a sentence with $r$ alternating quantifiers $Q_i \in \{ \exists, \forall \}$ (i.e., $Q_{i+1} \ne Q_i$), and $\varphi(p_1, p_2, \ldots, p_m)$ is
a quantifier-free formula with atomic formulas of the form $p_i \bowtie 0$, where $\bowtie\ \in \{ <, >, = \}$. Let $D$
denote the ring generated by the coefficients of the polynomials in $\mathcal{P}$. Then the following assertions
hold.

1. There is an algorithm to decide the truth of $\Phi$ using

$$m^{\prod_i (k_i+1)} \cdot d^{\prod_i O(k_i)} \cdot \mathrm{len}(\varphi)$$

arithmetic operations (multiplication, addition, and sign determination) in $D$.

2. If $D = \mathbb{Z}$ (the set of integers) and the bit sizes of the coefficients of the polynomials are
bounded by $\gamma$, then the bit sizes of the integers appearing in the intermediate computations of
the truth of $\Phi$ are bounded by

$$\gamma \cdot d^{\prod_i O(k_i)}.$$

The result of part 1 of Theorem 2 holds for sentences with symbolic constants as coefficients. 
The result of part 2 of Theorem 2 is for the special case of sentences with only integer coefficients. 
Part 2 of Theorem 2 follows from the results of [1], but is not explicitly stated as a theorem there; 
for an explicit statement as a theorem, see Theorem 14.14 and Theorem 14.16 of [2]. 
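Since the bound in part 1 of Theorem 2 depends on the length of the formula, the following Python sketch shows how the length measure of this section can be computed on a small abstract syntax tree. The tuple encoding of formulas and the function names are assumptions of this sketch; coefficients are omitted from the encoding because each contributes exactly 1 to the length of its monomial.

```python
def monomial_length(exponents):
    """Length of a monomial given as {variable: exponent}.

    Length = degree + number of variables + 1 (for the coefficient).
    """
    degree = sum(exponents.values())
    return degree + len(exponents) + 1


def polynomial_length(monomials):
    """Sum of the monomial lengths plus the number of monomials."""
    return sum(monomial_length(m) for m in monomials) + len(monomials)


def formula_length(phi):
    """Length of a formula encoded as nested tuples (hypothetical encoding):

    ("atom", polynomial)            for p < 0 or p = 0
    ("not", f), ("and", f, g), ("or", f, g)
    ("exists", x, f), ("forall", x, f)
    """
    kind = phi[0]
    if kind == "atom":
        return polynomial_length(phi[1]) + 2
    if kind == "not":
        return formula_length(phi[1]) + 1
    if kind in ("and", "or"):
        return formula_length(phi[1]) + formula_length(phi[2]) + 1
    if kind in ("exists", "forall"):
        return formula_length(phi[2]) + 2
    raise ValueError(f"unknown formula kind: {kind!r}")


# The monomial c * x^3 * y^2 * z from the text: degree 6, three variables.
mono = {"x": 3, "y": 2, "z": 1}
atom = ("atom", [mono])
sentence = ("exists", "x", ("and", atom, ("not", atom)))
```

For the example, the monomial has length 10, the single-monomial polynomial has length 11, and the atomic formula has length 13.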

Remark 1 Given two integers $a$ and $b$, let $|a|$ and $|b|$ denote the space to express $a$ and $b$ in binary,
respectively. The following assertions hold: given integers $a$ and $b$,

1. given signs of $a$ and $b$, the sign determination of $a + b$ can be done in $O(|a| + |b|)$ time, i.e., in
linear time, and the sign determination of $a \cdot b$ can be done in $O(1)$ time, i.e., in constant time;

2. addition of $a$ and $b$ can be done in $O(|a| + |b|)$ time, i.e., in linear time; and

3. multiplication of $a$ and $b$ can be done in $O(|a| \cdot |b|)$ time, i.e., in quadratic time.

It follows from the above observations, along with Theorem 2, that if $D = \mathbb{Z}$ and the bit sizes of
the coefficients of the polynomials appearing in $\Phi$ are bounded by $\gamma$, then the truth of $\Phi$ can be
determined in time

$$m^{\prod_i O(k_i+1)} \cdot d^{\prod_i O(k_i)} \cdot O(\mathrm{len}(\varphi) \cdot \gamma^2). \quad (4)$$

4 Computation of Values in Stochastic Games

The values in stochastic limit-average games can be irrational even if all rewards and transition 
probability values are rational [15]. Hence, we can algorithmically only approximate the values to 
within a precision $\varepsilon$, for $\varepsilon > 0$.

Discounted value functions. Let $G$ be a stochastic game with reward function $r$. For a real $\beta$,
with $0 < \beta < 1$, the $\beta$-discounted value function $v_1^\beta$ is defined as follows:

$$v_1^\beta(s) = \sup_{\sigma\in\Sigma}\inf_{\pi\in\Pi}\ \beta \cdot \mathbb{E}_s^{\sigma,\pi}\Big[\sum_{i=1}^{\infty} (1-\beta)^{i} \cdot r(X_i, \Theta_{i,1}, \Theta_{i,2})\Big].$$

For a stochastic game $G$, the $\beta$-discounted value function $v_1^\beta$ is monotonic with respect to $\beta$ in a
neighborhood of 0 [4].

4.1 Sentence for the value of a stochastic game

We now describe how we can obtain a sentence in the theory of the real-closed field that states that
the value of a stochastic limit-average game at a given state is strictly greater than $\alpha$, for a real $\alpha$.
The sentence applies to the case where the rewards and the transition probabilities are specified as
symbolic or rational constants.

Formula for $\beta$-discounted value functions. Given a real $\alpha$ and a stochastic limit-average game
$G$, we present a formula in the theory of the real-closed field to express that the $\beta$-discounted value
$v_1^\beta(s)$ at a given state $s$ is strictly greater than $\alpha$, for $0 < \beta < 1$. A valuation $v \in \mathbb{R}^n$ is a vector
of reals, and for $1 \le i \le n$, the $i$-th component of $v$ represents the value $v(i)$ for state $i$. For every
state $s \in S$ and for every move $b \in \Gamma_2(s)$ we define a polynomial $u_{(s,b,1)}$ for player 1 as a function
of $x \in \mathcal{D}(\Gamma_1(s))$, a valuation $v$ and $0 < \beta < 1$ as follows:

$$u_{(s,b,1)}(x, v, \beta) = \beta \cdot \sum_{a \in \Gamma_1(s)} x(a) \cdot r(s,a,b) + (1-\beta) \cdot \sum_{a \in \Gamma_1(s)} x(a) \cdot \sum_{t \in S} \delta(s,a,b)(t) \cdot v(t) - v(s).$$

The polynomial $u_{(s,b,1)}$ consists of the variables $\beta$, and $x(a)$ for $a \in \Gamma_1(s)$, and $v(t)$ for $t \in S$.
Observe that given a stochastic limit-average game, $r(s,a,b)$ for $a \in \Gamma_1(s)$, and $\delta(s,a,b)(t)$ for
$t \in S$ and $a \in \Gamma_1(s)$ are rational or symbolic constants given by the game graph, not variables.
The coefficients of the polynomial are $r(s,a,b)$ for $a \in \Gamma_1(s)$, and $\delta(s,a,b)(t)$ for $a \in \Gamma_1(s)$ and
$t \in S$. Hence the polynomial has degree 3 and has $1 + |\Gamma_1(s)| + n$ variables. Similarly, for $s \in S$,
$a \in \Gamma_1(s)$, $y \in \mathcal{D}(\Gamma_2(s))$, $v \in \mathbb{R}^n$, and $0 < \beta < 1$, we have polynomials $u_{(s,a,2)}$ defined by

$$u_{(s,a,2)}(y, v, \beta) = \beta \cdot \sum_{b \in \Gamma_2(s)} y(b) \cdot r(s,a,b) + (1-\beta) \cdot \sum_{b \in \Gamma_2(s)} y(b) \cdot \sum_{t \in S} \delta(s,a,b)(t) \cdot v(t) - v(s).$$

The sentence stating that $v_1^\beta(s)$ is strictly greater than $\alpha$ is as follows. We have variables $x_s(a)$ for
$s \in S$ and $a \in \Gamma_1(s)$, $y_s(b)$ for $s \in S$ and $b \in \Gamma_2(s)$, and variables $v(1), v(2), \ldots, v(n)$. For simplicity
we write $x_s$ for the vector of variables $x_s(a_1), x_s(a_2), \ldots, x_s(a_j)$, where $\Gamma_1(s) = \{ a_1, a_2, \ldots, a_j \}$,
$y_s$ for the vector of variables $y_s(b_1), y_s(b_2), \ldots, y_s(b_l)$, where $\Gamma_2(s) = \{ b_1, b_2, \ldots, b_l \}$, and $v$ for the
vector of variables $v(1), v(2), \ldots, v(n)$. The sentence is as follows:

$$\Phi_\beta(s,\alpha) = \exists x_1, \ldots, x_n.\ \exists y_1, \ldots, y_n.\ \exists v.\ \Psi(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)$$
$$\wedge \bigwedge_{s \in S,\, b \in \Gamma_2(s)} \big(u_{(s,b,1)}(x_s, v, \beta) \ge 0\big) \ \wedge \bigwedge_{s \in S,\, a \in \Gamma_1(s)} \big(u_{(s,a,2)}(y_s, v, \beta) \le 0\big)$$
$$\wedge\ (v(s) - \alpha > 0);$$

where $\Psi(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)$ specifies the constraints that $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$
are valid randomized strategies and is defined as follows:

$$\Psi(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n) = \bigwedge_{s \in S} \Big(\big(\sum_{a \in \Gamma_1(s)} x_s(a)\big) - 1 = 0\Big) \ \wedge \bigwedge_{s \in S,\, a \in \Gamma_1(s)} \big(x_s(a) \ge 0\big)$$
$$\wedge \bigwedge_{s \in S} \Big(\big(\sum_{b \in \Gamma_2(s)} y_s(b)\big) - 1 = 0\Big) \ \wedge \bigwedge_{s \in S,\, b \in \Gamma_2(s)} \big(y_s(b) \ge 0\big).$$

The total number of polynomials in $\Phi_\beta(s,\alpha)$ is $1 + \sum_{s \in S} (3 \cdot |\Gamma_1(s)| + 3 \cdot |\Gamma_2(s)| + 2) = O(|\delta|)$. In
the above formula we treat $\beta$ as a variable; it is a free variable in $\Phi_\beta(s,\alpha)$. Given a stochastic
limit-average game $G$, for all $0 < \beta < 1$, the correctness of $\Phi_\beta(s,\alpha)$ to specify that $v_1^\beta(s) > \alpha$ can
be proved from the results of [16].
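The quantifier-free body of $\Phi_\beta(s,\alpha)$ can be checked directly for a candidate witness. The Python sketch below evaluates the polynomials $u_{(s,b,1)}$ and $u_{(s,a,2)}$ with exact rational arithmetic and tests all conjuncts for given $x$, $y$, $v$, and $\beta$. It only verifies a proposed witness; it does not decide the sentence, which requires quantifier elimination. The function names and the one-state example game (a repeated matching-pennies game) are assumptions of this sketch.

```python
from fractions import Fraction


def u1(s, b, x, v, beta, moves1, delta, reward):
    """u_(s,b,1)(x, v, beta) for a mixed move x of player 1 at state s."""
    immediate = sum(x[a] * reward[(s, a, b)] for a in moves1[s])
    future = sum(x[a] * sum(p * v[t] for t, p in delta[(s, a, b)].items())
                 for a in moves1[s])
    return beta * immediate + (1 - beta) * future - v[s]


def u2(s, a, y, v, beta, moves2, delta, reward):
    """u_(s,a,2)(y, v, beta) for a mixed move y of player 2 at state s."""
    immediate = sum(y[b] * reward[(s, a, b)] for b in moves2[s])
    future = sum(y[b] * sum(p * v[t] for t, p in delta[(s, a, b)].items())
                 for b in moves2[s])
    return beta * immediate + (1 - beta) * future - v[s]


def is_distribution(d, moves):
    return sum(d[m] for m in moves) == 1 and all(d[m] >= 0 for m in moves)


def satisfies_body(s0, alpha, beta, xs, ys, v, moves1, moves2, delta, reward):
    """Check Psi, all u-constraints, and v(s0) - alpha > 0 for a candidate witness."""
    for s in moves1:
        if not (is_distribution(xs[s], moves1[s]) and is_distribution(ys[s], moves2[s])):
            return False
        if any(u1(s, b, xs[s], v, beta, moves1, delta, reward) < 0 for b in moves2[s]):
            return False
        if any(u2(s, a, ys[s], v, beta, moves2, delta, reward) > 0 for a in moves1[s]):
            return False
    return v[s0] - alpha > 0


# Illustrative one-state game: player 1 gets 1 if both moves match, else 0.
moves1 = {1: ["H", "T"]}
moves2 = {1: ["H", "T"]}
delta = {(1, a, b): {1: Fraction(1)} for a in "HT" for b in "HT"}
reward = {(1, a, b): Fraction(int(a == b)) for a in "HT" for b in "HT"}

half = Fraction(1, 2)
uniform = {1: {"H": half, "T": half}}
beta = Fraction(1, 10)
good = satisfies_body(1, Fraction(2, 5), beta, uniform, uniform, {1: half},
                      moves1, moves2, delta, reward)
too_high = satisfies_body(1, Fraction(2, 5), beta, uniform, uniform, {1: Fraction(3, 5)},
                          moves1, moves2, delta, reward)
```

With uniform mixing and $v(1) = 1/2$, every $u$ polynomial evaluates to exactly 0, so the witness certifies that the discounted value exceeds $2/5$; raising $v(1)$ to $3/5$ violates the constraint $u_{(s,b,1)} \ge 0$.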

Value of a game as limit of discounted games. The result of Mertens and Neyman [11] established
that the value of a stochastic limit-average game is the limit of the $\beta$-discounted values, as $\beta$ goes
to 0. Formally, we have

$$v_1(s) = \lim_{\beta \to 0^+} v_1^\beta(s).$$

Sentence for the value of a stochastic game. From the characterization of the value of a
stochastic limit-average game as the limit of the $\beta$-discounted values and the monotonicity property
of the $\beta$-discounted values in a neighborhood of 0, we obtain the following sentence $\Phi(s,\alpha)$ stating
that the value at state $s$ is strictly greater than $\alpha$. In addition to variables for $\Phi_\beta(s,\alpha)$, we have
the variables $\beta$ and $\beta_1$. The sentence $\Phi(s,\alpha)$ specifies the expression

$$\exists \beta_1 > 0.\ \forall \beta \in (0, \beta_1).\ \Phi_\beta(s,\alpha),$$

and is defined as follows:

$$\Phi(s,\alpha) = \exists \beta_1.\ \forall \beta.\ \exists x_1, \ldots, x_n.\ \exists y_1, \ldots, y_n.\ \exists v.\ \Psi(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)$$
$$\wedge\ (\beta_1 > 0) \wedge \Big[ (\beta_1 - \beta \le 0) \vee (\beta \le 0) \vee \Big( (\beta_1 - \beta > 0)$$
$$\wedge \bigwedge_{s \in S,\, b \in \Gamma_2(s)} \big(u_{(s,b,1)}(x_s, v, \beta) \ge 0\big)$$
$$\wedge \bigwedge_{s \in S,\, a \in \Gamma_1(s)} \big(u_{(s,a,2)}(y_s, v, \beta) \le 0\big)$$
$$\wedge\ (v(s) - \alpha > 0) \Big) \Big];$$

where $\Psi(x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n)$ specifies the constraints that $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$
are valid randomized strategies (the same formula used for $\Phi_\beta(s,\alpha)$).$^{2}$ Observe that $\Phi(s,\alpha)$ contains
no free variable (i.e., the variables $x_s$, $y_s$, $v$, $\beta_1$, and $\beta$ are quantified). A similar sentence was used
in [4] for values of discounted games. The total number of polynomials in $\Phi(s,\alpha)$ is $O(|\delta|)$; in
addition to the $O(|\delta|)$ polynomials of $\Phi_\beta(s,\alpha)$ there are 4 more polynomials in $\Phi(s,\alpha)$. In the
setting of Theorem 2 we obtain the following bounds for $\Phi(s,\alpha)$:

$$m = O(|\delta|); \quad k = O(|\delta|); \quad \prod_i (k_i + 1) = O(|\delta|); \quad r = O(1); \quad d = 3; \quad (5)$$

and hence we have

$$m^{\prod_i (k_i+1)} \cdot d^{\prod_i O(k_i)} = O(|\delta|)^{O(|\delta|)} = 2^{O(|\delta| \cdot \log(|\delta|))}.$$

Also observe that for a stochastic game $G$, the sum of the lengths of the polynomials appearing in
the sentence is $O(|\delta|)$. The present analysis along with Theorem 2 yields Theorem 3. The result of
Theorem 3 holds for stochastic limit-average games where the transition probabilities and rewards
are specified as symbolic constants.



2 Our detailed formulas $\Phi_\beta(s,\alpha)$ and $\Phi(s,\alpha)$ can be shortened; however, the present formulas make it easier to
understand the bound on parameters required for complexity bounds.

Theorem 3 Given a stochastic limit-average game $G$ with reward function $r$, a state $s$ of $G$, and
a real $\alpha$, there is an algorithm to decide whether $v_1(s) > \alpha$ using $2^{O(|\delta| \cdot \log(|\delta|))} \cdot O(|\delta|)$ arithmetic
operations (addition, multiplication, and sign determination) in the ring generated by the set

$$\{ r(s,a,b) \mid s \in S, a \in \Gamma_1(s), b \in \Gamma_2(s) \} \cup \{ \delta(s,a,b)(t) \mid s,t \in S, a \in \Gamma_1(s), b \in \Gamma_2(s) \} \cup \{ \alpha \}.$$



4.2 Algorithmic analysis 

For algorithmic analysis we consider rational stochastic games, i.e., stochastic games such that
$r(s,a,b)$ and $\delta(s,a,b)(t)$ are rational for all states $s,t \in S$, and moves $a \in \Gamma_1(s)$ and $b \in \Gamma_2(s)$. In
the sequel we will only consider rational stochastic games. Given the sentence $\Phi(s,\alpha)$ to specify
that $v_1(s) > \alpha$, we first reduce it to an equivalent sentence $\widehat{\Phi}(s,\alpha)$ as follows.

• For every rational coefficient $\ell = \frac{q_1}{q_2}$, where $q_1, q_2 \in \mathbb{Z}$, appearing in $\Phi(s,\alpha)$ we apply the
following procedure:

1. introduce a new variable $z_\ell$;

2. replace $\ell$ by $z_\ell$ in $\Phi(s,\alpha)$;

3. add a polynomial $q_2 \cdot z_\ell - q_1 = 0$ as a conjunct to the quantifier-free body of the formula;
and

4. existentially quantify $z_\ell$ in the block of existential quantifiers after quantifying $\beta_1$ and
$\beta$.
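The coefficient-replacement procedure above can be sketched in a few lines of Python. For each named rational coefficient the sketch introduces a fresh variable name and records the integer pair $(q_2, q_1)$ of the added conjunct $q_2 \cdot z_\ell - q_1 = 0$. The function name, the naming scheme `z_<name>`, and the sample coefficients are assumptions of this sketch.

```python
from fractions import Fraction


def integer_constraints(coefficients):
    """Map each rational coefficient to an integer-coefficient constraint.

    coefficients: dict from a coefficient name to a rational value q1/q2.
    Returns a list of (variable, q2, q1), each meaning  q2 * variable - q1 = 0.
    """
    constraints = []
    for name, value in coefficients.items():
        value = Fraction(value)          # normalizes to lowest terms, q2 > 0
        constraints.append((f"z_{name}", value.denominator, value.numerator))
    return constraints


# Hypothetical coefficients: one reward and one transition probability.
coeffs = {"r_1_a_b": Fraction(3, 4), "delta_1_a_b_2": Fraction(-2, 6)}
constraints = integer_constraints(coeffs)
```

Each constraint has a unique solution, namely the original rational value, so replacing the coefficient by the new variable does not change the truth of the sentence.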

Thus we add $O(|\delta|)$ variables and polynomials, and increase the degree of the polynomials in $\Phi(s,\alpha)$
by 1. Also observe that the coefficients in $\widehat{\Phi}(s,\alpha)$ are integers, and hence the ring $D$ generated by
the coefficients in $\widehat{\Phi}(s,\alpha)$ is $\mathbb{Z}$. Similar to the bounds obtained in (5), in the setting of Theorem 2
we obtain the following bounds for $\widehat{\Phi}(s,\alpha)$:

$$m = O(|\delta|); \quad k = O(|\delta|); \quad \prod_i (k_i + 1) = O(|\delta|); \quad r = O(1); \quad d = 4;$$

and hence

$$m^{\prod_i O(k_i+1)} \cdot d^{\prod_i O(k_i)} = O(|\delta|)^{O(|\delta|)} = 2^{O(|\delta| \cdot \log(|\delta|))}.$$

Also observe that the length of the sentence $\widehat{\Phi}(s,\alpha)$ can be bounded by $O(|\delta|)$, and the sum of
the bit sizes of the coefficients in $\widehat{\Phi}(s,\alpha)$ can be bounded by $O(|G| + |\alpha|)$, where $|\alpha|$ is the space
required to express $\alpha$ in binary. This along with (4) of Remark 1 yields the following result.

Theorem 4 Given a rational stochastic limit-average game $G$, a state $s$ of $G$, and a rational $\alpha$,
there is an algorithm that decides whether $v_1(s) > \alpha$ in time

$$2^{O(|\delta| \cdot \log(|\delta|))} \cdot O(|\delta|) \cdot O(|G|^2 + |\alpha|^2) = 2^{O(|\delta| \cdot \log(|\delta|))} \cdot O(|G|^2 + |\alpha|^2).$$



4.3 Approximating the value of a stochastic game 

We now present an algorithm that approximates the value $v_1(s)$ within a tolerance of $\varepsilon > 0$.
The algorithm (Algorithm 1) is obtained by a binary search technique along with the result of
Theorem 4. Algorithm 1 works for the special case of normalized rational stochastic games. We
first define normalized rational stochastic games and then present a reduction of rational stochastic
games to normalized rational stochastic games.

Normalized rational stochastic games. A rational stochastic game is normalized if the reward
function satisfies the following two conditions: (1) $\min\{ r(s,a,b) \mid s \in S, a \in \Gamma_1(s), b \in \Gamma_2(s) \} \ge 0$;
and (2) $\max\{ r(s,a,b) \mid s \in S, a \in \Gamma_1(s), b \in \Gamma_2(s) \} \le 1$.

Reduction. We now present a reduction of rational stochastic games to normalized rational
stochastic games, such that by approximating the values of normalized rational stochastic games we
can approximate the values of rational stochastic games. Given a reward function $r : S \times A \times A \to \mathbb{R}$,
let

$$M = \max\{ \mathrm{abs}(r(s,a,b)) \mid s \in S, a \in \Gamma_1(s), b \in \Gamma_2(s) \},$$

where $\mathrm{abs}(r(s,a,b))$ denotes the absolute value of $r(s,a,b)$. Without loss of generality we assume
$M > 0$. Otherwise, $r(s,a,b) = 0$ for all states $s \in S$, and moves $a \in \Gamma_1(s)$ and $b \in \Gamma_2(s)$, and
hence $v_1(s) = 0$ for all states $s \in S$ (i.e., the value function can be trivially computed). Consider
the reward function $r^+ : S \times A \times A \to [0, 1]$ defined as follows: for $s \in S$, $a \in \Gamma_1(s)$, and $b \in \Gamma_2(s)$,
we have

$$r^+(s,a,b) = \frac{r(s,a,b) + M}{2M}.$$

The reward function $r^+$ is normalized and the following assertion holds. Let $v_1$ and $v_1^+$ denote the
value functions for the reward functions $r$ and $r^+$, respectively. Then for all states $s \in S$ we have

$$v_1^+(s) = \frac{v_1(s) + M}{2M}.$$

Hence it follows that for rationals $\alpha$, $l$, and $u$, such that $l \le u$, we have

$$v_1(s) > \alpha \text{ iff } v_1^+(s) > \frac{\alpha + M}{2M}; \quad\text{and}\quad v_1^+(s) \in [l, u] \text{ iff } v_1(s) \in [M \cdot (2l - 1), M \cdot (2u - 1)].$$

Given a rational $\varepsilon > 0$, to obtain an interval $[l_1, u_1]$ such that $u_1 - l_1 \le \varepsilon$ and $v_1(s) \in [l_1, u_1]$, we first
obtain an interval $[l, u]$ such that $u - l \le \frac{\varepsilon}{2M}$ and $v_1^+(s) \in [l, u]$. From the interval $[l, u]$ we obtain the
interval $[l_1, u_1] = [M \cdot (2l - 1), M \cdot (2u - 1)]$ such that $v_1(s) \in [l_1, u_1]$ and $u_1 - l_1 = 2 \cdot M \cdot (u - l) \le \varepsilon$.
Hence we present the algorithm to approximate the values for normalized rational stochastic games.
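The reduction, combined with the binary search that Algorithm 1 performs, can be sketched in Python as follows. The decision procedure of Theorem 4 is not implemented here: it is replaced by a caller-supplied oracle `value_exceeds(m)` that answers whether the normalized value is strictly greater than $m$. The function names, the oracle interface, and the example (whose hidden normalized value is $1/\sqrt{2}$, tested exactly via $2m^2 < 1$) are assumptions of this sketch.

```python
from fractions import Fraction


def normalize_rewards(rewards):
    """Return M and the normalized rewards r+ = (r + M) / (2M), all in [0, 1]."""
    M = max(abs(Fraction(r)) for r in rewards.values())
    if M == 0:
        raise ValueError("all rewards are zero; the value is trivially 0")
    return M, {key: (Fraction(r) + M) / (2 * M) for key, r in rewards.items()}


def binary_search(value_exceeds, eps):
    """Binary search on [0, 1] using an oracle for 'value > m'.

    Runs ceil(log2(1/eps)) halving steps, so the final width is at most eps.
    """
    steps = 0
    while Fraction(1, 2 ** steps) > eps:
        steps += 1
    l, u = Fraction(0), Fraction(1)
    m = Fraction(1, 2)
    for _ in range(steps):
        if value_exceeds(m):
            l = m            # value > m, so m is a lower bound
        else:
            u = m            # value <= m, so m is an upper bound
        m = (l + u) / 2
    return l, u


def denormalize_interval(l, u, M):
    """Map an interval for v+ back to an interval for v = M * (2 v+ - 1)."""
    return M * (2 * l - 1), M * (2 * u - 1)


# Illustrative data: three reward entries, and a stand-in oracle whose hidden
# normalized value is 1/sqrt(2).
rewards = {("s", "a1", "b"): -3, ("s", "a2", "b"): 1, ("s", "a3", "b"): 2}
M, normalized = normalize_rewards(rewards)

eps = Fraction(1, 100)
l, u = binary_search(lambda m: 2 * m * m < 1, eps / (2 * M))
l1, u1 = denormalize_interval(l, u, M)
```

The search on the normalized game is run with tolerance $\varepsilon/(2M)$, so that the interval mapped back to the original game has length at most $\varepsilon$.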

Algorithm 1 Approximating the value of a stochastic game

Input: a normalized rational stochastic limit-average game G,
       a state s of G, and a rational value $\epsilon > 0$ specifying the desired tolerance.
Output: a rational interval $[l, u]$ such that $u - l \le 2\epsilon$ and $v_1(s) \in [l, u]$.

1. $l := 0$; $u := 1$; $m := \frac{1}{2}$;

2. repeat for $\lceil \log(\frac{1}{\epsilon}) \rceil$ steps

   2.1. if $\Phi(s, m)$, then

        2.1.1. $l := m$; $u := u$; $m := \frac{l + u}{2}$;

   2.2. else

        2.2.1. $l := l$; $u := m$; $m := \frac{l + u}{2}$;

3. return $[l, u]$;

Running time of Algorithm 1. In Algorithm 1 we denote by $\Phi(s, m)$ the sentence to specify
that $v_1(s) > m$, and by Theorem 4 the truth of $\Phi(s, m)$ can be decided in time

$2^{O(|S| \cdot \log(|S|))} \cdot O(|G|^2 + |m|^2)$,

for a stochastic game G, where $|m|$ is the number of bits required to specify $m$. In Algorithm 1,
the variables $l$ and $u$ are initially set to 0 and 1, respectively. Since the game is normalized, the
initial values of $l$ and $u$ provide lower and upper bounds on the value, and provide starting
bounds for the binary search. In each iteration of the algorithm, in Steps 2.1.1 and 2.2.1, there is
a division by 2. It follows that after $i$ iterations $l$, $u$, and $m$ can be expressed as $\frac{q}{2^i}$, where $q$ is an
integer and $q \le 2^i$. Hence $l$, $u$, and $m$ can always be expressed in

$O(\log(\frac{1}{\epsilon}))$

bits. The loop in Step 2 runs for $\lceil \log(\frac{1}{\epsilon}) \rceil = O(\log(\frac{1}{\epsilon}))$ iterations, and every iteration can be
computed in time $2^{O(|S| \cdot \log(|S|))} \cdot O(|G|^2 + \log^2(\frac{1}{\epsilon}))$. This gives the following theorem.
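The binary search of Algorithm 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the decision procedure itself: the callable `phi` is a hypothetical stand-in for the sentence $\Phi(s, m)$ of Theorem 4 (it must return True exactly when $v_1(s) > m$), and the logarithm is assumed to be base 2.

```python
from fractions import Fraction


def approximate_value(phi, eps):
    """Binary search over [0, 1]; phi(m) must return True iff v1(s) > m.

    Returns rationals (l, u) with l <= v1(s) <= u and u - l <= eps.
    """
    eps = Fraction(eps)
    # Smallest k with 2^(-k) <= eps, i.e. ceil(log2(1/eps)) computed exactly.
    steps = 0
    while Fraction(1, 2 ** steps) > eps:
        steps += 1

    l, u = Fraction(0), Fraction(1)
    m = Fraction(1, 2)
    for _ in range(steps):
        if phi(m):      # v1(s) > m: raise the lower bound
            l = m
        else:           # v1(s) <= m: lower the upper bound
            u = m
        m = (l + u) / 2  # the division by 2 performed in every iteration
    return l, u
```

Because every bound is a dyadic rational whose denominator is at most $2^i$ after $i$ iterations, the threshold handed to the oracle stays short, which is the point of the bit-size argument above.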

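Theorem 5 Given a normalized rational stochastic limit-average game G, a state s of G, and a
rational $\epsilon > 0$, Algorithm 1 computes an interval $[l, u]$ such that $v_1(s) \in [l, u]$ and $u - l \le 2\epsilon$, in
time

$2^{O(|S| \cdot \log(|S|))} \cdot O(|G|^2 \cdot \log(\frac{1}{\epsilon}) + \log^3(\frac{1}{\epsilon}))$.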



The reduction from rational stochastic games to normalized stochastic games shows that for
a rational stochastic game G and a rational tolerance $\epsilon > 0$, to obtain an interval of length at
most $\epsilon$ that contains the value $v_1(s)$, it suffices to obtain an interval of length at most $\frac{\epsilon}{2M}$ that
contains the value in the corresponding normalized game, where
$M = \max\{ \mathrm{abs}(r(s, a, b)) \mid s \in S, a \in \Gamma_1(s), b \in \Gamma_2(s) \}$. Since $M$ can be expressed in $|G|$ bits, it follows that the size of the
normalized game is $O(|G|^2)$. Given a tolerance $\epsilon > 0$ for the rational stochastic game, we need
to consider the tolerance $\frac{\epsilon}{2M}$ for the normalized game. The above analysis along with Theorem 5
yields the following corollary (the corollary is obtained from Theorem 5 by substituting $|G|$ by $|G|^2$,
and $\log(\frac{1}{\epsilon})$ by $|G| \cdot \log(\frac{1}{\epsilon})$).

Corollary 1 Given a rational stochastic limit-average game G, a state s of G, and a rational
$\epsilon > 0$, an interval $[l, u]$ such that $v_1(s) \in [l, u]$ and $u - l \le 2\epsilon$ can be computed in time

$2^{O(|S| \cdot \log(|S|))} \cdot O(|G|^5 \cdot \log(\frac{1}{\epsilon}) + |G|^3 \cdot \log^3(\frac{1}{\epsilon}))$.

The complexity class EXPTIME. A problem is in the complexity class EXPTIME [13] if there
is an algorithm A that solves the problem, and there is a polynomial $p(\cdot)$ such that for all inputs $I$
of $|I|$ bits, the running time of the algorithm A on input $I$ can be bounded by $2^{O(p(|I|))}$. In the case of
rational stochastic limit-average games, the input is the game G, i.e., the input requires
$|G|$ bits. Hence from Theorem 4 and Corollary 1 we obtain the following result.

Theorem 6 Given a rational stochastic limit-average game G, a state s of G, rational $\epsilon > 0$, and
rational $\alpha$, the following assertions hold.

1. (Decision problem) Whether $v_1(s) > \alpha$ can be decided in EXPTIME.

2. (Approximation problem) An interval $[l, u]$ such that $u - l \le 2\epsilon$ and $v_1(s) \in [l, u]$ can be
computed in EXPTIME.

Approximate analysis of games with approximate description. Let $G = (S, A, \Gamma_1, \Gamma_2, \delta, r)$
and $G' = (S, A, \Gamma_1, \Gamma_2, \delta', r')$ be two stochastic games such that

1. for all $s, t \in S$ and for all $a \in \Gamma_1(s)$ and $b \in \Gamma_2(s)$, we have

$\delta(s, a, b)(t) \le (1 + \eta) \cdot \delta'(s, a, b)(t)$ and $\delta'(s, a, b)(t) \le (1 + \eta) \cdot \delta(s, a, b)(t)$,
for $\eta < \frac{1}{2|S|}$; and

2. for all $s \in S$ and for all $a \in \Gamma_1(s)$ and all $b \in \Gamma_2(s)$ we have

$\mathrm{abs}(r(s, a, b) - r'(s, a, b)) \le \gamma$.
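The two conditions above can be evaluated directly for a pair of explicitly given games. The sketch below computes the smallest $\eta$ and $\gamma$ that satisfy them, assuming non-strict inequalities; the dictionary encoding of the transition and reward functions and the name `closeness` are assumptions made only for this illustration.

```python
from fractions import Fraction


def closeness(delta, delta2, r, r2):
    """Smallest (eta, gamma) satisfying conditions 1 and 2.

    delta, delta2: dict mapping (s, a, b) to a dict {t: probability}.
    r, r2:         dict mapping (s, a, b) to a reward.
    Returns eta = None when the supports differ, so that no finite eta works.
    """
    gamma = max(abs(Fraction(r[k]) - Fraction(r2[k])) for k in r)
    eta = Fraction(0)
    for key in delta:
        successors = set(delta[key]) | set(delta2[key])
        for t in successors:
            p = Fraction(delta[key].get(t, 0))
            q = Fraction(delta2[key].get(t, 0))
            if p == q:
                continue
            if p == 0 or q == 0:
                return None, gamma  # one side is zero: no multiplicative bound
            # p <= (1 + eta) * q and q <= (1 + eta) * p
            eta = max(eta, p / q - 1, q / p - 1)
    return eta, gamma
```

The returned $\eta$ still has to be compared against the bound on $\eta$ stated in condition 1 before the continuity result is applied.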

Let $\rho(G, G')$ be defined as the infimum over $\frac{4 \cdot \eta \cdot |S|}{1 - 2 \cdot \eta \cdot |S|} \cdot \|r\| + \gamma$, where $\eta, \gamma$ range over all pairs
that satisfy the above two inequalities. From the result of [17] it follows that the absolute difference
in the values of a player at all states in $G$ and $G'$ is bounded by $\rho(G, G')$. Hence given a game $G$
and an auxiliary game $G'$ that approximates $G$ within $\eta$, i.e., $\rho(G, G') \le \eta$, we can approximate
the values of the game $G'$ for $\epsilon > 0$, and obtain an $\eta + \epsilon$ approximation of the values of the game $G$.
This enables us to approximate the values of stochastic games described approximately.

Unfortunately, the only lower bound we know on the complexity of the decision problem is
PTIME-hardness (polynomial-time hardness). The hardness follows from a reduction from
alternating reachability [3, 9]. Even for the simpler case of perfect-information deterministic games, no
polynomial-time algorithm is known [19], and the best known deterministic algorithm for
perfect-information games is exponential in the size of the game. In the case of perfect-information stochastic
games, deterministic and stationary optimal strategies exist [10]. Since the number of deterministic
stationary strategies is at most exponential in the size of the game, there is an exponential-time
algorithm to compute the values exactly (not approximately) (also see the survey [12]). From
the polynomial-time algorithm to compute values in Markov decision processes [6] and the existence
of pure stationary optimal strategies in perfect-information games [10], it follows that the decision
problem for perfect-information games lies in NP $\cap$ coNP. Finding better complexity bounds than
EXPTIME for the decision and the approximation problems for stochastic games is an interesting
open problem; even for deterministic games no better bound is known.